Representing and reasoning about uncertainty is crucial for autonomous agents acting in partially observable environments with noisy sensors. Partially observable Markov decision processes (POMDPs) serve as a general framework for representing problems in which uncertainty is an important factor. Online sample-based POMDP methods have emerged as efficient approaches to solving large POMDPs and have been shown to extend to continuous domains. However, these solutions struggle to find long-horizon plans in problems with significant uncertainty. Exploration heuristics can help guide planning, but many real-world settings contain significant task-irrelevant uncertainty that might distract from the task objective. In this paper, we propose STRUG, an online POMDP solver capable of handling domains that require long-horizon planning with significant task-relevant and task-irrelevant uncertainty. We demonstrate our solution on several temporally extended versions of toy POMDP problems as well as robotic manipulation of articulated objects using a neural perception frontend to construct a distribution of possible models. Our results show that STRUG outperforms the current sample-based online POMDP solvers on several tasks.
Translated by Google Translate
In this paper, we examine the problem of visibility-aware robot navigation among movable obstacles (VANAMO). A variant of the well-known NAMO robotic planning problem, VANAMO puts additional visibility constraints on robot motion and object movability. This new problem formulation lifts the restrictive assumption that the map is fully visible and the object positions are fully known. We provide a formal definition of the VANAMO problem and propose the Look and Manipulate Backchaining (LaMB) algorithm for solving such problems. LaMB has a simple vision-based API that makes it more easily transferable to real-world robot applications and scales to large 3D environments. To evaluate LaMB, we construct a set of tasks that illustrate the complex interplay between visibility and object movability that can arise in mobile base manipulation problems in unknown environments. We show that LaMB outperforms NAMO and visibility-aware motion planning approaches, as well as simple combinations of them, on complex manipulation problems with partial observability.
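We introduce ThreeDWorld (TDW), a platform for interactive multi-modal physical simulation. TDW enables simulation of high-fidelity sensory data and physical interactions between mobile agents and objects in rich 3D environments. Unique properties include: real-time near-photo-realistic image rendering; a library of objects and environments, and routines for their customization; generative procedures for efficiently building classes of new environments; high-fidelity audio rendering; realistic physical interactions for a variety of material types, including cloths, liquids, and deformable objects; customizable agents that embody AI agents; and support for human interactions with VR devices. TDW's API enables multiple agents to interact within a simulation and returns a range of sensor and physics data representing the state of the world. We present initial experiments enabled by TDW in emerging research directions in computer vision, machine learning, and cognitive science, including multi-modal physical scene understanding, physical dynamics prediction, multi-agent interaction, models that learn like a child, and attention studies in humans and neural networks.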
A long-standing goal of machine-learning-based protein engineering is to accelerate the discovery of novel mutations that improve the function of a known protein. We introduce a sampling framework for evolving proteins in silico that supports mixing and matching a variety of unsupervised models, such as protein language models, and supervised models that predict protein function from sequence. By composing these models, we aim to improve our ability to evaluate unseen mutations and constrain search to regions of sequence space likely to contain functional proteins. Our framework achieves this without any model fine-tuning or re-training by constructing a product of experts distribution directly in discrete protein space. Instead of resorting to brute force search or random sampling, which is typical of classic directed evolution, we introduce a fast MCMC sampler that uses gradients to propose promising mutations. We conduct in silico directed evolution experiments on wide fitness landscapes and across a range of different pre-trained unsupervised models, including a 650M parameter protein language model. Our results demonstrate an ability to efficiently discover variants with high evolutionary likelihood as well as estimated activity multiple mutations away from a wild type protein, suggesting our sampler provides a practical and effective new paradigm for machine-learning-based protein engineering.
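As a rough illustration of sampling from a product of experts directly in discrete sequence space, the sketch below sums the log-scores of several experts and runs a plain Metropolis-Hastings chain with uniform single-site mutations. This is a deliberately simpler stand-in for the gradient-based proposal described in the abstract, not a reproduction of it. The two experts, the sequence `WILD_TYPE`, and all function names are hypothetical placeholders for a protein language model and a supervised fitness predictor.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
WILD_TYPE = "MKTAYIAKQR"           # hypothetical starting sequence


def toy_unsupervised_expert(seq):
    """Hypothetical stand-in for a language model: penalise distance from wild type."""
    return -0.5 * sum(1 for a, b in zip(seq, WILD_TYPE) if a != b)


def toy_supervised_expert(seq):
    """Hypothetical stand-in for a fitness predictor: reward tryptophan content."""
    return 1.0 * seq.count("W")


EXPERTS = [toy_unsupervised_expert, toy_supervised_expert]


def product_of_experts_logp(seq, experts):
    """Unnormalised log-probability: the sum of each expert's log-score."""
    return sum(expert(seq) for expert in experts)


def mh_step(seq, experts, rng):
    """One Metropolis-Hastings step with a uniform single-site mutation proposal."""
    pos = rng.randrange(len(seq))
    new_residue = rng.choice([a for a in ALPHABET if a != seq[pos]])
    proposal = seq[:pos] + new_residue + seq[pos + 1:]
    delta = (product_of_experts_logp(proposal, experts)
             - product_of_experts_logp(seq, experts))
    # The proposal is symmetric, so the acceptance probability is min(1, exp(delta)).
    if delta >= 0 or rng.random() < math.exp(delta):
        return proposal
    return seq


def run_chain(n_steps, seed=0):
    """Run the chain from the wild type; return the final and best-scoring sequences."""
    rng = random.Random(seed)
    seq = best = WILD_TYPE
    for _ in range(n_steps):
        seq = mh_step(seq, EXPERTS, rng)
        if product_of_experts_logp(seq, EXPERTS) > product_of_experts_logp(best, EXPERTS):
            best = seq
    return seq, best
```

Because the experts only enter through their summed log-scores, swapping in a different pre-trained model requires no fine-tuning or re-training, which is the property the abstract emphasises. A uniform proposal like this one mixes slowly on long sequences, which is the motivation for using gradient information to choose mutations.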
Score-based approaches to sampling have shown much success as generative algorithms that produce new samples from a target density given a pool of initial samples. In this work, we consider the setting in which we have no initial samples from the target density, but rather $0^{th}$ and $1^{st}$ order oracle access to the log likelihood. Such problems may arise in Bayesian posterior sampling, or in approximate minimization of non-convex functions. Using this knowledge alone, we propose a Monte Carlo method to estimate the score empirically as a particular expectation of a random variable. Using this estimator, we can then run a discrete version of the backward flow SDE to produce samples from the target density. This approach has the benefit of not relying on a pool of initial samples from the target density, and it does not rely on a neural network or other black box model to estimate the score.
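To make the idea of estimating a score as an expectation concrete, here is a minimal one-dimensional sketch. It assumes a Gaussian-smoothed target $p_\sigma = p * N(0, \sigma^2)$ and uses the identity $\nabla \log p_\sigma(x) = E[(y - x)/\sigma^2 \mid x]$, approximated by self-normalised importance sampling with proposals drawn from $N(x, \sigma^2)$. This is one simple estimator that needs only zeroth-order access to the log density; it is not necessarily the estimator used in the paper, and the function names are hypothetical.

```python
import math
import random


def estimate_score(x, sigma, log_p, n_samples, rng):
    """Estimate d/dx log p_sigma(x), where p_sigma is p convolved with N(0, sigma^2).

    The posterior over the clean point y given the noisy point x is proportional
    to p(y) * N(x; y, sigma^2). Proposals come from N(x, sigma^2), so the
    importance weight of each proposal is simply the unnormalised p(y).
    """
    ys = [rng.gauss(x, sigma) for _ in range(n_samples)]
    log_w = [log_p(y) for y in ys]
    shift = max(log_w)  # subtract the max log-weight for numerical stability
    ws = [math.exp(lw - shift) for lw in log_w]
    total = sum(ws)
    return sum(w * (y - x) for w, y in zip(ws, ys)) / (total * sigma ** 2)


def log_standard_normal(y):
    """Unnormalised log density of N(0, 1), used here as a checkable target."""
    return -0.5 * y * y


rng = random.Random(0)
estimate = estimate_score(x=1.0, sigma=1.0, log_p=log_standard_normal,
                          n_samples=50_000, rng=rng)
# For a standard normal target, p_sigma = N(0, 1 + sigma^2), so the exact
# score at x is -x / (1 + sigma^2).
exact = -1.0 / (1.0 + 1.0 ** 2)
```

The standard normal target is chosen only because its smoothed score is known in closed form, which gives a reference value for the Monte Carlo estimate.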
Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement. Their success hinges on the fact that the underlying physical phenomena are continuous. For inherently discrete and categorical data such as language, various diffusion-inspired alternatives have been proposed. However, the continuous nature of diffusion models conveys many benefits, and in this work we endeavour to preserve it. We propose CDCD, a framework for modelling categorical data with diffusion models that are continuous both in time and input space. We demonstrate its efficacy on several language modelling tasks.
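Data is the lifeblood of modern machine learning systems, including those in Music Information Retrieval (MIR). However, MIR has long been held back by small datasets and unreliable labels. In this work, we propose to break this bottleneck using generative modeling. By pipelining a generative model of notes (Coconet trained on Bach Chorales) with a structured synthesis model of chamber ensembles (MIDI-DDSP trained on URMP), we demonstrate a system capable of producing unlimited amounts of realistic chorale music with rich annotations, including mixes, stems, MIDI, note-level performance attributes (staccato, vibrato, etc.), and even fine-grained synthesis parameters (pitch, amplitude, etc.). We call this system the Chamber Ensemble Generator (CEG), and use it to generate a large dataset of chorales from four different chamber ensembles (CocoChorales). We demonstrate that data generated using our approach improves state-of-the-art models for music transcription and source separation, and we release both the system and the dataset as an open-source foundation for future work in the MIR community.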
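Training deep neural networks in low rank, i.e. with factorised layers, is of particular interest to the community: it offers efficiency over unfactorised training in terms of both memory consumption and training time. Prior work has focused on low-rank approximations of pre-trained networks and on training in low-rank space with additional objectives, offering various ad hoc explanations for the chosen practice. We analyse techniques that work well in practice, and through extensive ablations on models such as GPT2 we provide evidence that falsifies common beliefs in the field, hinting in the process at exciting research opportunities that still need answering.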
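A factorised layer replaces a dense weight matrix of shape d_out x d_in with the product of two thin matrices, U (d_out x r) and V (r x d_in). The sketch below, with hypothetical helper names and illustrative sizes, checks that applying the two factors in sequence matches the full product and compares the weight counts.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def factorised_forward(u, v, x):
    """Compute U @ (V @ x) without ever forming the full matrix U @ V."""
    return matvec(u, matvec(v, x))


def weight_param_counts(d_in, d_out, rank):
    """Return (unfactorised, factorised) weight counts, ignoring biases."""
    return d_in * d_out, rank * (d_in + d_out)


rng = random.Random(0)
d_in, d_out, rank = 8, 6, 2  # small illustrative sizes
U = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(d_out)]  # d_out x rank
V = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(rank)]   # rank x d_in
x = [rng.gauss(0, 1) for _ in range(d_in)]

y_factorised = factorised_forward(U, V, x)
y_full = matvec(matmul(U, V), x)  # same result via the materialised full weight

# Illustrative layer sizes (an assumption, not taken from the abstract).
full_params, low_rank_params = weight_param_counts(768, 3072, 64)
```

With these illustrative sizes the factorised layer stores 245,760 weights instead of 2,359,296, which is where the memory and compute savings come from; the saving shrinks as the rank approaches min(d_in, d_out).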
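Research in presentation attack detection (PAD) for iris recognition has largely moved beyond evaluation in "closed-set" scenarios, to emphasize the ability to generalize to presentation attack types not present in the training data. This paper offers several contributions to understand and extend the state of the art in open-set iris PAD. First, it describes the most authoritative evaluation to date of iris PAD. We have curated the largest publicly available image dataset for this problem, drawing from 26 benchmarks previously released by various groups and adding 150,000 images released with the journal version of this paper, to create a set of 450,000 images representing authentic irises and seven types of presentation attack instrument (PAI). We formulate a leave-one-PAI-out evaluation protocol, and show that even the best algorithms in closed-set evaluations exhibit catastrophic failures on multiple attack types in the open-set scenario. This includes algorithms that performed well in the most recent LivDet-Iris 2020 competition, which may come from the fact that the LivDet-Iris protocol emphasizes sequestered images rather than unseen attack types. Second, we evaluate the accuracy of five open-source iris presentation attack detection algorithms available today, one of which is newly proposed in this paper, and build an ensemble method that beats the winner of LivDet-Iris 2020 by a substantial margin. This paper demonstrates that closed-set iris PAD, when all PAIs are known during training, is a solved problem, with multiple algorithms showing very high accuracy, while open-set iris PAD, when evaluated correctly, is not yet solved. The newly created dataset, new open-source algorithms, and evaluation protocol, made publicly available with the journal version of this paper, provide the experimental artifacts that researchers can use to measure progress on this important problem.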
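The open-set evaluation idea can be sketched as a fold generator that withholds one attack type at a time from training. The label names, the `leave_one_attack_out` function, and the way authentic samples are divided between train and test are assumptions made for illustration; the paper's actual protocol may differ in these details.

```python
from collections import defaultdict

BONA_FIDE = "authentic"  # hypothetical label for genuine iris images


def leave_one_attack_out(samples):
    """Yield (held_out_type, train, test) folds from (sample_id, label) pairs.

    Assumption for this sketch: authentic samples are split by alternating
    position so that both train and test contain authentic images.
    """
    by_label = defaultdict(list)
    for sample_id, label in samples:
        by_label[label].append((sample_id, label))
    authentic = by_label.pop(BONA_FIDE, [])
    auth_train, auth_test = authentic[0::2], authentic[1::2]
    for held_out in sorted(by_label):
        # Train on authentic images plus every attack type except the held-out one.
        train = list(auth_train)
        for label, items in by_label.items():
            if label != held_out:
                train.extend(items)
        # Test on authentic images plus only the unseen attack type.
        test = auth_test + by_label[held_out]
        yield held_out, train, test


# Hypothetical toy data; the attack-type names are placeholders.
samples = [
    ("img0", "authentic"), ("img1", "authentic"),
    ("img2", "printout"), ("img3", "printout"),
    ("img4", "contact_lens"), ("img5", "artificial_eye"),
]
folds = list(leave_one_attack_out(samples))
```

Each fold measures how a detector trained without any example of one attack type behaves when that type appears at test time, which is the failure mode a closed-set split cannot reveal.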
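Face image synthesis has progressed beyond the point at which humans can effectively distinguish authentic faces from synthetically generated ones. Recently developed synthetic face image detectors boast "better-than-human" discriminative ability, especially those guided by human perceptual intelligence during the model's training process. In this paper, we investigate whether these human-guided synthetic face detectors can assist non-expert human operators in the task of synthetic image detection, compared to models trained without human guidance. We conducted a large-scale experiment in which more than 1,560 subjects classified whether an image showed an authentic or synthetically generated face and annotated the regions that supported their decisions. In total, 56,015 annotations across 3,780 unique face images were collected. All subjects first examined samples without any AI support, and then samples accompanied by (a) the AI's decision ("synthetic" or "authentic"), (b) class activation maps illustrating which regions the model deems salient for its decision, or (c) both the AI's decision and the AI's saliency map. Synthetic faces were generated with six modern generative adversarial networks. Interesting observations from this experiment include: (1) models trained with human guidance offer better support to human examination of face images than models trained traditionally using cross-entropy loss, (2) binary decisions presented to humans offer better support than saliency maps, and (3) understanding the AI's accuracy helps humans increase their trust in a given model and thus increase their overall accuracy. This work demonstrates that although humans supported by machines achieve better-than-random accuracy in synthetic face detection, the ways of supplying humans with AI support and of building trust are key factors determining the effectiveness of the human-AI tandem.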